Residue computing device

ABSTRACT

The present invention provides a residue computing device on a Galois Field GF(2^m), for calculating a residue R of a product of a multiplier factor X and a multiplicand Y under a modulo Z, which comprises a gate G1 for allowing the multiplier factor X to pass therethrough when a leading bit MSB of the multiplicand Y is 1, an adder ADD for adding a temporary residue R′ and a value obtained by the passage, a gate G2 for allowing the modulo Z to pass therethrough when a leading bit MSB of a summed value SUM of the adder is 1, and a subtractor SUB for subtracting the modulo Z from the summed value SUM of the adder when the leading bit MSB of the summed value SUM is 1, wherein a process for setting a value obtained by shifting a subtracted value of the subtractor by one bit, as the temporary residue R′ on the basis of the next clock is repeatedly performed for each clock to thereby calculate the residue R.

BACKGROUND OF THE INVENTION

[0001] The present invention relates to a residue computing device on a Galois Field, which is most suitable for a residue arithmetic operation and a power-residue arithmetic operation used in elliptic curve cryptography, etc.

[0002] Arithmetic operations defined on elliptic curves over an affine space form a group, and a microcomputer or the like can calculate them efficiently. When they are taken over a finite field (Galois Field) by residue arithmetic operations, however, the amount of calculation becomes enormous. Applications to cryptography were therefore considered in the 1980s. It has been found that this type of elliptic curve cryptosystem is capable of implementing security of the same degree with a key having a shorter bit length as compared with the conventional DSA system or RSA system, and attention has been given to this point in recent years. For example, an elliptic curve cryptosystem whose key length is 224 bits can handle calculation processing with a calculated amount of about 1/7 as compared with an RSA system whose key length is 1024 bits. The elliptic curve cryptosystem has thus been considered well suited to an IC card, particularly a wireless IC card. With a wireless IC card, a third party can easily intercept communication data, so encryption of the data cannot be avoided. While the wireless IC card has the merit of being able to pass through a gate without contact, it must decrypt a cipher and perform authentication during its short passage time. It is thus necessary to provide a residue computing device which efficiently executes a residue arithmetic operation or the like in the elliptic curve cryptosystem.

[0003] When it is desired to execute the residue arithmetic operation or the like, a dedicated LSI or a processor equipped with a multiplier of about 32 bits performs the calculations by, for example, dividing a long key into 32-bit pieces and calculating on each piece. An algorithm that avoids division wherever possible has been adopted for the calculations. This is a contrivance for reducing chip size. As such an algorithm, it has been known that the calculation time becomes short if the Montgomery method, for example, is used.

[0004] However, such a method using a multiplier having a smaller number of bits has many problems. Since a complex algorithm is used, the amount of calculation increases, and the clock must unavoidably be made fast because the calculations must finish in a short period of time, thereby increasing current consumption. Further, since data in the course of calculation must be stored in registers or the like and a large number of registers are used, the amount of circuitry cannot be reduced very far.

[0005] An increase in current consumption will impose a restriction on a wireless IC card, particularly, a wireless IC card of such a type that power is supplied in the form of external electromagnetic waves. An increase in the size of a chip will raise the cost of wireless IC cards supplied in large quantities.

SUMMARY OF THE INVENTION

[0006] The present invention has been made to solve the foregoing problems and aims to provide a residue computing device on a Galois Field, which is operated on a relatively low speed clock and most suitable for an arithmetic operation and a power-residue arithmetic operation used in elliptic curve cryptography.

[0007] In order to achieve the above object, the present invention adopts a residue look-ahead or prefetch arithmetic operation or an algorithm (hereinafter called a “residue prefetch arithmetic operation”) and employs a circuit configuration or the like for reducing the number of operation clocks.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] While the specification concludes with claims particularly pointing out and distinctly claiming the subject matter which is regarded as the invention, it is believed that the invention, the objects and features of the invention and further objects, features and advantages thereof will be better understood from the following description taken in connection with the accompanying drawings in which:

[0009] FIG. 1 is a principle diagram showing a residue prefetch arithmetic operation of a residue computing device according to a first embodiment;

[0010] FIG. 2 is a diagram illustrating a normal calculation-based process in FIG. 1;

[0011] FIG. 3 is a diagram showing a calculation Table according to the present invention shown in FIG. 1;

[0012] FIG. 4 is a circuit diagram showing the residue computing device shown in FIG. 1, which has been speeded up;

[0013] FIG. 5 is a principle diagram illustrating a power-residue arithmetic operation executed in a second embodiment;

[0014] FIG. 6 is a diagram showing a calculation Table of products made term by term by a method of successive substitution;

[0015] FIG. 7 is a diagram illustrating a calculation Table of power residues of X;

[0016] FIG. 8 is a circuit diagram showing a power residue computing device shown in FIG. 5, which has been speeded up;

[0017] FIG. 9 is a principle diagram illustrating a power-residue arithmetic operation executed in a third embodiment;

[0018] FIG. 10 is a diagram showing the relationship of magnitude between a summed or added value SUM and the value of a modulo Z;

[0019] FIG. 11 is a diagram showing a specific calculation example of a residue arithmetic operation;

[0020] FIG. 12 is a circuit diagram illustrating a power residue computing device shown in FIG. 9, which has been speeded up;

[0021] FIG. 13 is a diagram showing the result of subtraction by a subtractor shown in FIG. 12; and

[0022] FIG. 14 is a circuit diagram for calculating Z=X+Y for data X, Y and Z by use of an adder with a carry.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0023] Preferred embodiments of the present invention will hereinafter be described in detail with reference to the accompanying drawings. The base of a first embodiment according to the present invention will first be described. FIG. 1 is a principle diagram showing a residue look-ahead or prefetch arithmetic operation of a residue computing unit or device on a Galois Field GF(2^m) with an mth-order irreducible polynomial as a modulo. When m=224 in FIG. 1, registers X(301), R(302), Z(303) and TP(311) each have a register length of 224+1 bits. Registers Y(304) and S(305) are likewise (224+1)-bit left-shift registers. A gate G1(306) is an AND gate which allows data on a 225-bit bus to pass therethrough as it is when a control signal is given as 1 and brings all of the data on the 225-bit bus to 0 when the control signal is given as 0. The gate G1(306) actually comprises 225 2-input AND gates. A gate G2(307) is similar to the above. An adder ADD(308) adds two 225-bit data values. Since it is not necessary to take a carry into consideration in arithmetic operations on the Galois Field GF(2^m), the adder ADD(308) can comprise 225 2-input EXOR elements. A subtractor SUB(309) subtracts one 225-bit data value from another. Since it is not necessary to take a borrow into consideration in this case either, the subtractor SUB(309) can be made perfectly identical in configuration to the adder ADD(308). A wire shift circuit T(310) raises the order by one through a change in wire connection alone. Thus, the wire shift circuit T(310) contains no circuit elements of its own. Incidentally, a temporary register TP is provided to hold a temporary residue produced upon a previous clock and prevent a linkage of arithmetic operations.

[0024] A description will next be made of how the circuit shown in FIG. 1 operates. In FIG. 1, a residue arithmetic operation is effected on a multiplicand X and a multiplier factor Y under a modulo Z, so that a residue R and a quotient S can be determined. Namely, X•Y (mod Z) is calculated. The residue prefetch arithmetic operation according to the present invention starts by fetching the leading bit (MSB) of the register Y. If the leading bit is 1, then the gate G1 passes the contents of X to the adder ADD as they are. If the leading bit is 0, then the gate G1 passes 0. Since the register Y is a left shift register, adding for all bits in the register Y with a shift for each bit finally results in the calculation of the product of X and Y. However, the residue prefetch arithmetic operation according to the present invention differs from the normal arithmetic operation in that the modulo Z is subtracted at each such addition. An added or summed value SUM is compared with the modulo Z. When the summed value SUM is equal to or larger than the modulo Z, an arithmetic operation for subtracting the modulo Z therefrom is performed. Namely, since a precondition for the calculation is that the leading bit of the modulo Z is 1, the subtraction can be executed whenever MSB of the summed value SUM is 1, and the subtractor SUB then subtracts the modulo Z, supplied through the gate G2, from the summed value SUM. If MSB of the summed value SUM is 0, the subtraction is not carried out. The reason for this subtraction is to remove in advance a multiple of Z, which does not contribute to the residue. It should also be noted that the removed multiple of Z contributes to the quotient S. When the kth bit (where k=0, ..., m−1) of the register Y is being handled, the multiple of Z amounts to 2^(m−k) times the modulo Z, and the corresponding bit is brought to LSB of the shift register S. That is, MSB of the summed value SUM is taken into LSB of the shift register S as it is. When k finally becomes equal to m, the value of the shift register S is the quotient. The output of the subtractor SUB is brought to the residue register R as a temporary residue. When the kth bit of the register Y is being handled, 2^(m−k) times the temporary residue is the actual residue. The initial value of the temporary residue is 0. The temporary residue must be digit-aligned for the next arithmetic operation. When the clock advances by one and the next lower bit of the register Y is calculated, the residue is added in beforehand. This is the actual reason why the adder ADD exists. The residue prefetch arithmetic operation according to the present invention is characterized in that the temporary residue left by the previous arithmetic operation is added in advance, before the next subtraction. Upon this addition, the wire shift circuit T aligns the digits with the next digit. It is important that, since the wire shift circuit T actually performs only a change in wire connection, it does not depend on a clock at all. This fact is useful later in reducing the number of clocks necessary for calculations.
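The operation described above can be modelled in software. The following Python function is only a minimal sketch under stated assumptions, not the circuit itself: the name `residue_prefetch` and the `width` parameter are hypothetical, operands are held as integers whose bits are polynomial coefficients, bit `width-1` of Z is assumed to be 1, and that bit is assumed to be 0 in X.

```python
def residue_prefetch(x, y, z, width):
    """Bit-serial X*Y mod Z on GF(2^m); operands are `width`-bit integers.

    Assumes bit (width-1) of z is 1 and bit (width-1) of x is 0.
    Returns (residue R, quotient S).
    """
    msb = 1 << (width - 1)
    r = 0  # temporary residue R', initial value 0
    s = 0  # quotient shift register S
    for k in range(width):
        y_bit = (y >> (width - 1 - k)) & 1   # leading bit of left shift register Y
        g1 = x if y_bit else 0               # gate G1 passes X or 0
        total = g1 ^ (r << 1)                # wire shift T, then adder ADD (XOR)
        top = 1 if total & msb else 0        # MSB of SUM
        g2 = z if top else 0                 # gate G2 passes Z or 0
        r = total ^ g2                       # subtractor SUB (also XOR)
        s = (s << 1) | top                   # MSB of SUM enters LSB of S
    return r, s
```

Each loop iteration stands for one clock. Because the conditional subtraction always clears the top bit, the shifted temporary residue never exceeds `width` bits, which mirrors the statement that no register wider than m+1 bits is needed.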

[0025] Simple calculations will be illustrated by way of example to aid understanding of the operation of the circuit shown in FIG. 1. In this example the order m of the irreducible polynomial is 5, so that elements on GF(2^5) can be represented in vectors of 6 bits.

[0026] When X=T^3+T+1, Y=T^4+T^3+T^2+1 and Z=T^5+T^2+1, the polynomial expressions are represented in vectors as X=(001011), Y=(011101) and Z=(100101), and X•Y (mod Z), i.e., (001011)•(011101) {mod (100101)}, is calculated.

[0027] In the normal calculation, the product on the left side is determined first and a division process is then executed, so the number of calculations is twelve. A straightforward calculation also needs a register length of 11 bits, approximately double. The manner of such a calculation is illustrated in FIG. 2. From this calculation, the quotient results in (000110) and the residue results in (010001).
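For comparison, the normal calculation (full product first, division afterwards) can be sketched in Python as follows. The helper names `clmul` and `poly_divmod` are hypothetical and chosen only for this illustration; polynomials over GF(2) are held as integers whose bits are the coefficients.

```python
def clmul(a, b):
    """Carry-less (GF(2) polynomial) product of a and b."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add the shifted multiplicand without carries
        a <<= 1
        b >>= 1
    return result


def poly_divmod(n, d):
    """Quotient and remainder of n / d as GF(2) polynomials (d must be nonzero)."""
    q = 0
    while n.bit_length() >= d.bit_length():
        shift = n.bit_length() - d.bit_length()
        q |= 1 << shift      # record this multiple of d in the quotient
        n ^= d << shift      # subtract (XOR) the aligned divisor
    return q, n


product = clmul(0b001011, 0b011101)                 # full, unreduced product
quotient, residue = poly_divmod(product, 0b100101)  # then divide by Z
```

With the example values, `quotient` is `0b110` and `residue` is `0b10001`, matching the vectors (000110) and (010001) quoted above. The intermediate `product` is wider than any single operand, which is the register-length cost the text refers to.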

[0028] The residue computing device of FIG. 1 according to the present invention performs calculations in the following manner. At each step, look at the leading bit of Y.

[0029] (1) Since the first leading bit of Y is 0, the input to ADD takes (000000), and R assumes its initial value (000000). Therefore, R remains (000000) even after its shift, and the sum SUM results in (000000). MSB of SUM is 0 and hence the subtraction of SUB is not carried out. Thus, the new R is also (000000).

[0030] (2) Since the second leading bit of Y is 1, the input to ADD takes (001011), and R still remains (000000). Therefore, the sum SUM takes (001011) and MSB of SUM is 0. Since the subtraction of SUB is not carried out, the value of R is updated to (001011).

[0031] (3) Since the third leading bit of Y is also 1, the input to ADD takes (001011), and R is (001011) this time. R is shifted by one bit and the two are added together.

SUM=(001011)+(010110)=(011101)

[0032] MSB of SUM is 0 and hence the subtraction of SUB is not performed. Therefore, the value of R is updated to R=(011101).

[0033] (4) The fourth leading bit of Y is 1, and a one-bit shift is applied similarly. As a result,

SUM=(001011)+(111010)=(110001)

[0034] MSB of SUM results in 1 and the subtraction (identical to the addition on GF(2^m)) of SUB is executed. Thus,

R=(110001)+(100101)=(010100)

[0035] (5) The fifth leading bit of Y is 0, the input to ADD takes (000000), and the value obtained by shifting R by one bit is (101000).

SUM=(000000)+(101000)=(101000)

[0036] MSB of SUM is 1 this time, and the modulo Z is subtracted from SUM by SUB.

R=(101000)+(100101)=(001101)

[0037] (6) The sixth leading bit of Y is 1, the input to ADD takes (001011), and the sum of this input and the value obtained after one-bit shifting of R is as follows:

SUM=(001011)+(011010)=(010001)

[0038] Since MSB of SUM is 0, the subtraction of SUB is not executed and the updated R is R=(010001).

[0039] The above calculations are shown in FIG. 3 as a calculation Table.

[0040] As a result, the residue R coincides with the normal calculation shown in FIG. 2. Incidentally, if MSB of SUM is taken into the shift register S at each step, the value thus collected becomes the quotient. This is because all of the multiples of the modulo Z are collected up. The portion surrounded by a frame in the calculation Table shown in FIG. 3 corresponds to the quotient S, i.e., S=(000110). This result also coincides with the normal calculation shown in FIG. 2.

[0041] FIG. 4 illustrates the first embodiment according to the present invention and is a circuit wherein the residue computing device shown in FIG. 1 is speeded up by using a residue computing device on a Galois Field GF(2^m). In FIG. 4, residue computing devices or units 3(112), 2(113), 1(114) and 0(115), each having the same circuit configuration, are each equivalent to the core portion of the residue computing device shown in FIG. 1 and are connected in series in this order. The residue computing device shown in FIG. 4 takes a circuit configuration wherein 4 bits of a Y register (104) are collectively processed on one clock. An X register (101), a Z register (103), and an R register (102) are identical in configuration to those of FIG. 1. However, the X register and the Z register are connected to all the residue computing units, because every unit uses them in its arithmetic operation. The Y register (104) and an S register (105) are here 4-bit left shift registers. Namely, they shift 4 bits collectively on one clock. The upper 4 bits of the Y register are respectively used as control inputs to the AND gates (e.g., G1(106)) of the residue computing units 3 through 0. The lower 4 bits of the S register respectively store the control outputs from MSB of the summed values SUM of the residue computing units 3 through 0, which are also the inputs to the AND gates (e.g., G2(107)).

[0042] The operation of the residue computing device of FIG. 4 corresponding to the first embodiment according to the present invention will now be described in brief. Each of the residue computing devices or units performs exactly the same operation as the residue computing device shown in FIG. 1. The difference is that the residue computing device of FIG. 1 performs a residue calculation of one stage alone and leaves its temporary residue to the next clock, whereas the residue computing units of FIG. 4 calculate residues continuously in four stages and only then leave their temporary residue to the next clock. Namely, each residue computing unit passes its post-calculation temporary residue to the next-stage unit through a one-bit wire shift circuit T, because digit alignment must be done in each case. The temporary residue calculated continuously in four stages is stored in the R register. The temporary residue is stored in a temporary register TP upon the next clock and serves to prevent a linkage of arithmetic operations. It should be noted that the four-stage continuous calculation within one clock is possible because the one-bit wire shift circuits T do not depend on the clock at all. Generally speaking, the order is lowered by one per arithmetic operation, and this has no direct bearing on the clock. Carrying this way of thinking further, the residue arithmetic operation could be executed even if no clock were provided at all. When the residue computing devices are arranged in n stages and n-bit shift registers are adopted, the number of clocks is reduced to 1/n. The computing time becomes sufficiently short, but the LSI increases in chip size correspondingly. Whether to increase the number of residue computing units and thereby shorten the computing time, to reduce the pattern area of the LSI, or to give priority to a reduction in power consumption is a design problem.
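The chaining of several residue computing units per clock can be sketched as below. This is a software model under assumptions: the names `residue_stage` and `residue_multistage` are hypothetical, and when the register width is not a multiple of the number of stages, Y is assumed here to be zero-extended at its high-order end, which leaves the result unchanged because leading zero bits only shift a zero residue.

```python
def residue_stage(x, y_bit, z, r_prev, width):
    """One residue computing unit: wire shift, add, conditional subtract."""
    msb = 1 << (width - 1)
    total = (x if y_bit else 0) ^ (r_prev << 1)   # gate G1, shift T, adder ADD
    top = 1 if total & msb else 0                 # MSB of SUM
    return total ^ (z if top else 0), top         # gate G2 and subtractor SUB


def residue_multistage(x, y, z, width, stages):
    """X*Y mod Z with `stages` units chained per clock.

    Returns (residue R, quotient S, number of clocks).
    """
    clocks = -(-width // stages)        # ceil(width / stages)
    total_bits = clocks * stages        # Y zero-extended at the top (assumption)
    r = s = 0
    for clock in range(clocks):
        for stage in range(stages):     # combinational chain within one clock
            pos = total_bits - 1 - (clock * stages + stage)
            r, top = residue_stage(x, (y >> pos) & 1, z, r, width)
            s = (s << 1) | top          # S shifts `stages` bits per clock
    return r, s, clocks
```

For the 6-bit example with four stages, `residue_multistage(0b001011, 0b011101, 0b100101, 6, 4)` gives the same residue and quotient as the one-stage calculation in two clocks instead of six.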

[0043] The calculation of the quotient by the computing units must take two facts into account. First, whether a contribution to the quotient is made is determined by whether the subtractor SUB in each residue computing unit performs its subtraction. Second, a one-stage residue arithmetic operation lowers the order by only one, so that the degree of contribution is halved at each stage. Thus, the S register is also made up of a shift register performing a 4-bit shift on one clock, in conformity with the 4-bit shift configuration of the Y register in FIG. 4.

[0044] FIG. 5 is the base of a second embodiment according to the present invention and is also a principle diagram showing a power-residue arithmetic operation on a Galois Field GF(2^m). FIG. 5 mainly has three large configurations. The first is a circuit for calculating powers (powers of 2^k) of an element X on a Galois Field GF(2^m) through the use of a power residue computing unit (421). Thus, residues of X^(2^k) (where k=0, ..., m−1) are calculated. The second is a circuit for calculating a direct product of the terms calculated by the power residue computing unit (421) through the use of a direct-product residue computing unit (422). When n is expressed in binary form as n=Σak•2^k, a residue of X^(Σak•2^k) is thus calculated. The third is a circuit for determining, when n is expressed in binary form as described above, whether the direct product should be added according to the value of each bit, through the use of a register unit (423) for fixing up the number of power n.

[0045] The power residue computing unit (421) comprises a 225-bit X register (401), an RX^m register (402), a TP1 register (411), a Z register (omitted from the drawing), a Y register (404) corresponding to a left shift register and a residue computing unit (424) (having the same configuration as each of the residue computing units shown in FIG. 4). The power residue computing unit (421) differs greatly from the residue computing device shown in FIG. 1 in that a once-calculated residue RX^m is stored in the X and Y registers again and a residue corresponding to the product of the stored values is calculated on the subsequent clocks. Namely, the power residue computing unit according to the present invention is characterized by having a circuit configuration for continuously calculating the product of a value and itself, thereby raising the order by a power of 2 each time. First, the product of X and 1 is calculated, a residue of X^2 is then calculated based on the product of X and X, a residue of X^4 is further calculated based on the product of X^2 and X^2, and a residue of X^8, a residue of X^16, ... are subsequently calculated. This results in the calculation of residues of X^(2^k) (where k=0, ..., m−1). Incidentally, since it is not necessary to calculate the quotient, there is no need to provide the S register shown in FIG. 1. Further, the representation of the Z register is omitted in FIG. 5.

[0046] The direct product residue computing unit (422) has a circuit configuration substantially identical to the residue computing unit shown in FIG. 1. A TP3 register (416), a TP2 register (417), and an RX^n register (418) correspond to the Y register (304), TP register (311) and R register (302) respectively. A residue computing device (425) also has the same configuration as each of the residue computing units shown in FIG. 4. The difference is that a temporary power residue of X in the course of calculation is stored in the TP3 register and a direct product is calculated. The temporary power residue of X during calculation of the direct product is stored in the RX^n register, and the final power residue of X^n is stored therein after the completion of the calculation. The register unit (423) for fixing up the number of power (or exponent) n comprises an n register (419) corresponding to a right shift register and a gate (420). When n is expressed in binary form as n=Σak•2^k, it can be represented in vectors as n=(am−1, am−2, ..., a1, a0). If a bit is 1, then the direct product is calculated. If the bit is 0, then the direct product is not calculated, or the product of X and 1 is calculated. When the computational process is at the stage in which X^(2^k) is being calculated, LSB of the n register (419) indicates the value of the kth bit, and this value serves as a control signal for the gate (420), controlling whether the direct product at this stage should be added to the temporary power residue of X. Calculating the direct product according to the value of each bit is equivalent to calculating a residue of X^n.
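The interplay of the three parts (repeated squaring, direct product, and exponent bits taken from LSB) corresponds to the familiar right-to-left square-and-multiply scheme. The following Python sketch illustrates it under assumptions: the function names `gf2_mulmod` and `gf2_powmod` are hypothetical, and the mapping of variables to registers in the comments is an interpretation of the description, not a statement about the actual wiring.

```python
def gf2_mulmod(x, y, z, width):
    """X*Y mod Z on GF(2^m) by the bit-serial shift/add/subtract loop."""
    msb = 1 << (width - 1)
    r = 0
    for k in range(width - 1, -1, -1):
        r = (r << 1) ^ (x if (y >> k) & 1 else 0)  # shift, then add X or 0
        if r & msb:
            r ^= z                                 # subtract Z when MSB is 1
    return r


def gf2_powmod(x, n, z, width):
    """Residue of X^n mod Z, scanning the exponent n from its LSB."""
    square = x    # holds X^(2^k); plays the role of the RX^m register
    result = 1    # temporary power residue; plays the role of the RX^n register
    while n:
        if n & 1:                                  # LSB of the n register opens the gate
            result = gf2_mulmod(result, square, z, width)
        square = gf2_mulmod(square, square, z, width)
        n >>= 1                                    # right shift of the n register
    return result
```

With X=(001011) and Z=(100101), `gf2_powmod(0b001011, 30, 0b100101, 6)` returns `0b010000`, i.e. T^4, and exponent 31 returns 1.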

[0047] As an example of a simple arithmetic operation using the power residue computing device according to the present invention, a reverse or inverse element X^(−1) of the element X on a Galois Field GF(2^m) will be determined. The inverse element on the Galois Field GF(2^m) is uniquely determined by the original element X. Since the Galois Field is a finite field, some power of a given element X results in a residue 1 under a modulo Z. That is to say:

when X^n≡1 (mod Z) for ∃n∈ℤ

[0048] X^(n−1) can be set to the inverse element of X. The inverse element will first be calculated here by the normal method (method of successive substitution).

[0049] Let X=T^3+T+1, with vector representation (001011); Z=T^5+T^2+1, with vector representation (100101); and Y=ΣbkT^k.

[0050] X•Y=Σbk(T^k•X). Therefore, it can be represented in the following manner by use of the Table of FIG. 6.

$$\begin{aligned} X \cdot Y &= (00001)\,b_4 + (10010)\,b_3 + (01001)\,b_2 + (10110)\,b_1 + (01011)\,b_0 \\ &= 1 \end{aligned}$$

[0051] From the above, the following simultaneous linear equations are established using the fact that the coefficients of the higher-order terms are 0.

[0052] b1+b3=0, b2+b0=0, b1=0, b3+b1+b0=0, b4+b2+b0=1. Consequently, b4=1, b0=b1=b2=b3=0, ∴ Y=T^4

[0053] Actually, X•Y=(T^3+T+1)•T^4=(T^2+1)•Z+1≡1 (mod Z)

[0054] This calculation is equivalent to determining the solution of the simultaneous linear equations, i.e., calculating an inverse matrix with a matrix calculation. Thus, when m=224, it is necessary to obtain an inverse matrix of 224 bits. This is not a practical calculation method.
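The method of successive substitution for this small example can be checked with a few lines of Python. This is a sketch only: the names `times_t`, `rows`, `combine` and `solutions` are hypothetical, and the exhaustive search over all 32 candidates stands in for solving the linear system, which is feasible only because m=5.

```python
Z = 0b100101  # T^5 + T^2 + 1
X = 0b001011  # T^3 + T + 1


def times_t(a, z=Z, msb=0b100000):
    """Multiply a by T and reduce modulo z."""
    a <<= 1
    return a ^ z if a & msb else a


# rows[k] is T^k * X mod Z for k = 0..4
rows = [X]
for _ in range(4):
    rows.append(times_t(rows[-1]))


def combine(bits):
    """Sum of b_k * (T^k * X) over GF(2); bit k of `bits` is b_k."""
    acc = 0
    for k, row in enumerate(rows):
        if (bits >> k) & 1:
            acc ^= row
    return acc


# Exhaustive search for Y with X*Y = 1 (mod Z)
solutions = [y for y in range(32) if combine(y) == 1]
```

Here `rows` reproduces the five vectors (01011), (10110), (01001), (10010) and (00001) used in the equation above, and `solutions` contains the single value `0b10000`, i.e. Y=T^4.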

[0055] The above result will next be verified by a manual calculation. This calculation is shown in a calculation Table of FIG. 7. According to it, since raising X to the 31st power yields a residue 1, the inverse element of X is equal to X to the 30th power, and its residue results in (10000). This means that Y=T^4, and is a result that coincides with the method of successive substitution.

[0056] According to the theory of the Galois Field, the order of the multiplicative group of a Galois Field GF(p^m) is p^m−1, and X^(p^m−1)=1 is established with respect to its nonzero element X. Substituting p=2 and m=5 therein yields X^(31)=1, which also coincides with the above calculation. In general, 2^m−1=2^(m−1)+2^(m−2)+...+2+1, and in the case of m=5, 31=2^4+2^3+2^2+2+1. This is given as (011111) in vector representation. A number of ideas have been proposed for efficiently calculating the power of X from the regularity of such bits to thereby improve the computing speed of the inverse element. In the present invention, the residue of the inverse element of X can be calculated by directly using the above power residue computing device with n=(011110), without depending on such ideas. The temporary residues that appear in the calculation process are just the same as in the calculation Table of FIG. 7.

[0057] FIG. 8 illustrates the second embodiment according to the present invention and is a circuit wherein the power residue computing device shown in FIG. 5 is speeded up by using a power residue computing device on a Galois Field GF(2^m). In order to reduce the number of clocks necessary for calculations to 1/4 as compared with FIG. 4, a configuration in which residue computing devices or units are continuously connected sequentially in four stages is adopted. The means for achieving the speed-up is identical to FIG. 4. An X register (201), a Z register (203), an RX^m register (202), a TP1 register (211), a TP2 register (217) and an RX^n register (218) are respectively 225-bit registers. A Y register (204), an S register (205) and a TP3 register (216) are respectively 4-bit left shift registers of 225 bits. An n register (296) is a 4-bit right shift register of 225 bits. A gate G3(220) is identical to a gate G1(206) or the like in configuration. The residue computing unit (225) is also identical in configuration to the residue computing units 3 through 0 (212 through 215).

[0058] In the circuit shown in FIG. 8, the calculation of an inverse element of a given element X is the case in which the number of bits of n equal to 1 is maximum and the amount of calculation is maximum. When m=224 and the 225-bit registers are used, 57 clocks are required to calculate a residue of the product of X and Y. In order to calculate a residue of the inverse element, 225 times that number of clocks is required. When the residue computing units are continuously connected in n stages, (m+1)²/n clocks are generally required.
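The clock counts quoted above can be reproduced with simple arithmetic. The Python sketch below assumes, as the text does, that an inverse element costs (m+1) products; the function names are hypothetical.

```python
import math


def clocks_per_product(m, stages):
    """Clocks for one X*Y residue with (m+1)-bit registers, `stages` bits per clock."""
    return math.ceil((m + 1) / stages)


def clocks_for_inverse(m, stages):
    """Clocks for an inverse element, counted as (m+1) products as in the text."""
    return (m + 1) * clocks_per_product(m, stages)


per_product = clocks_per_product(224, 4)   # 57
inverse = clocks_for_inverse(224, 4)       # 225 * 57 = 12825
estimate = (224 + 1) ** 2 / 4              # (m+1)^2 / n = 12656.25
```

The exact count (12825) is slightly above the general estimate (12656.25) because 225 is not a multiple of 4, so each product is rounded up to a whole number of clocks.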

[0059] A description will be made of what happens if the residue computing device proposed in the first embodiment of the present invention is extended to GF(p) other than GF(2^m). In this case, addition (ADD) and subtraction (SUB) that take a carry into account must be executed. On GF(2^m), EXOR could handle the calculation processing, so it was not necessary to take the carry into consideration. Whether or not the subtraction should be executed is judged by using a carry (borrow) and a comparator (for making a magnitude comparison between each summed value and Z) in place of MSB of SUM. The 225-bit-by-225-bit arithmetic operations additionally need to calculate the carry (borrow) at high speed. It will be confirmed, however, that the circuit configuration can be arranged in exactly the same fashion. Namely, the proposed circuit diagram has extremely high general versatility.

[0060] FIG. 9 shows the base of a third embodiment of the present invention and is also a principle diagram of a residue computing device on GF(p) that adopts a residue prefetch arithmetic operation. The residue computing circuit shown in FIG. 9 can be roughly divided into three portions. The first portion is a residue add circuit (531), which is a circuit for adding a temporary residue R′ and a multiplicand X, comparing the result of addition SUM0 with a modulo Z and subtracting the modulo Z therefrom when SUM0 is greater than or equal to the modulo Z. The second portion is a wire shift circuit (532), which is a circuit for subtracting a modulo Z from a value shifted by a wire shift T when the value is greater than or equal to the modulo Z. The third portion is a quotient calculation circuit (533), which is a circuit for calculating a quotient for a modulo Z of a product X•Y using signals (S0 and S1) outputted as a result of subtraction.

[0061] The operation of the add circuit (531) is basically identical to that for GF(2^m). Look first at the upper bit (MSB) of Y. If it is 1, then X is added (ADD), whereas if it is 0, no addition is executed. The other operand of the addition is a temporary residue R′, whose initial value is 0. The added or summed value (SUM0) is compared with a modulo Z, and subtraction (SUB0) is executed only when the summed value is greater than or equal to the modulo Z as a result of the comparison. This arithmetic operation is executed to eliminate in advance a part (a multiple of Z) that does not contribute to the residue and to omit excess or unnecessary arithmetic operations. Namely, when no borrow appears as a result of the subtraction of Z by SUB0, a flag (BN0) is set and its value is latched. A signal (S0) outputted from the flag controls a multiplexer (MPX0). If the output signal (S0) is 0, then the summed value (SUM0) is selected as it is. If the output signal (S0) is 1, then the value resulting from the subtraction (SUB0) is selected. The selected value is stored in a residue register R. At the head of the next clock, the value of the residue register R is shifted to a temporary register TP. The temporary register TP is provided to prevent a linkage of arithmetic operations.

[0062] In the wire shift circuit (532), the value of the temporary register TP is shifted (T) by one bit, resulting in an intermediate temporary residue. This shift aligns digits in preparation for the next addition. Here, the post-shift value is compared with the modulo Z, and subtraction (SUB1) is executed only when the value is greater than or equal to the modulo Z. This is because, if the shifted value were left greater than or equal to Z, the summed value (SUM0) at the next calculation could reach or exceed 2Z; the subtraction prevents this. The calculation is carried out in exactly the same manner as in the residue add circuit. Namely, the subtract circuit SUB1, flag BN1 and multiplexer MPX1 correspond to SUB0, BN0 and MPX0 respectively. However, the data handled are different: the outputs are a signal (S1) produced from the flag BN1 and the temporary residue R′ produced from the multiplexer MPX1. At the next clock, the addition (ADD) involving the temporary residue (R′) is carried out again before the subtraction (SUB0). Incidentally, the quotient can be calculated from the results of comparison (S0 and S1). According to this calculation method, a final residue R is obtained after m+1 clocks for a product of values of m+1 bits.

[0063] The quotient calculation circuit (533) is slightly more complex. As to the output signal (S0) produced from the add circuit (531), when the k-th bit (where k=0, ..., m−1) of the Y register is being handled, the multiple of Z is 2^(m−k) times Z, and the signal is taken into the LSB of a right shift register SB0. On the other hand, as to the output signal (S1) produced from the wire shift circuit (532), when the k-th bit (where k=0, ..., m−1) of the Y register is being handled, the multiple of Z is 2^(m−k−1) times Z, so its contribution to the quotient S is half as large. Thus, the output signal (S1) is brought to the LSB of the right shift register SB1, after which digit alignment is carried out by a wire one-bit right shift circuit T^(−1), and the result is added as a contribution to the quotient S. After all the multiples of the modulo Z have been collected, their sum gives the correct quotient S.

[0064] The table of FIG. 10 shows in which cases the values subtracted by the subtractors (SUB0 and SUB1) should be selected by the multiplexers. The table shows the magnitude relationship between the value SUM added by the adder ADD and the value of the modulo Z. It represents the rule that whenever the summed value or the post-wire-shift value is equal to or greater than the modulo Z, the modulo Z is subtracted from it. A value of 0 in the table indicates that the subtracted value is not used, i.e., the corresponding multiplexer (MPX0 or MPX1) does not select the subtracted value. A value of 1 in the table indicates that the subtracted value is used, i.e., the corresponding multiplexer (MPX0 or MPX1) selects the subtracted value.

[0065] To clarify how the residue arithmetic operations on the Galois Field GF(p) are carried out when the residue prefetch arithmetic operation according to the present invention is adopted, a specific example of calculation is given below.

[0066] The relationships between the respective parts of the above circuit are first set out.

[0067] (1) SUM0=R′+X•Ymsb

[0068] where R′ indicates the temporary residue carried over from the upper digit, and Ymsb indicates the most significant bit of the Y register.

[0069] (2) R=SUM0−S0•Z

[0070] where S0 is 1 when the value subtracted by SUB0 is selected and 0 when it is not selected. Its successive values are stored in a subtracted value register SB0, which corresponds to SB0 in the calculation table of FIG. 11.

[0071] (3) R′=R•2−S1•Z

[0072] where S1 is 1 when the value subtracted by SUB1 is selected and 0 when it is not selected. Its successive values are stored in a subtracted value register SB1, which corresponds to SB1 in the calculation table of FIG. 11.

[0073] (4) S=SB0+SB1/2

[0074] The division of SB1 by 2 is done for digit alignment. This corresponds to the wire one-bit right shift T^(−1).
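
As an illustration only, relationships (1) to (4) can be sketched in Python as follows. The function name `residue_prefetch`, the parameter `nbits` (the bit length of the Y register) and the use of Python integers in place of registers are assumptions made for this sketch and are not part of the circuit. The sketch assumes 0 ≤ X < Z and that Y fits in `nbits` bits.

```python
def residue_prefetch(x, y, z, nbits):
    """Return (R, S) such that x*y == S*z + R, following relationships (1)-(4).

    Hypothetical software model of the FIG. 9 data flow.
    Assumes 0 <= x < z and 0 <= y < 2**nbits.
    """
    r_tmp = 0  # temporary residue R' (initial value 0)
    sb0 = 0    # subtracted value register SB0 (collects the S0 bits)
    sb1 = 0    # subtracted value register SB1 (collects the S1 bits)
    r = 0      # residue register R
    for k in range(nbits - 1, -1, -1):       # handle Y from its MSB downward
        y_msb = (y >> k) & 1
        sum0 = r_tmp + x * y_msb             # (1) SUM0 = R' + X*Ymsb
        s0 = 1 if sum0 >= z else 0           # no borrow in SUM0 - Z
        r = sum0 - s0 * z                    # (2) R = SUM0 - S0*Z
        shifted = r << 1                     # wire shift T
        s1 = 1 if shifted >= z else 0        # no borrow in 2R - Z
        r_tmp = shifted - s1 * z             # (3) R' = 2R - S1*Z
        sb0 = (sb0 << 1) | s0
        sb1 = (sb1 << 1) | s1
    s = sb0 + (sb1 >> 1)                     # (4) S = SB0 + SB1/2
    return r, s
```

Under these assumptions, `residue_prefetch(17, 27, 37, 6)` returns `(15, 12)`. The S1 bit produced at the final clock is discarded by the right shift in step (4).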

[0075] Next, consider how processing proceeds when X=17, Y=27 and Z=37. The vector (binary) representation of Y is (011011).

[0076] (1) 0th clock: R′=0 (initial value) and Ymsb=0

[0077] Thus, SUM0=0+X•0=0. This corresponds to mode ① of FIG. 10.

[0078] R=0−0•Z=0 and R′=0•2−0•Z=0

[0079] (2) 1st clock: R′=0 and Ymsb=1

[0080] Thus, SUM0=0+X•1=17. This corresponds to mode ① of FIG. 10.

[0081] R=17−0•Z=17 and R′=17•2−0•Z=34

[0082] (3) 2nd clock: R′=34 and Ymsb=1

[0083] Thus, SUM0=34+X•1=51. This corresponds to mode ③ of FIG. 10.

[0084] R=51−1•Z=14 and R′=14•2−0•Z=28

[0085] (4) 3rd clock: R′=28 and Ymsb=0

[0086] Thus, SUM0=28+X•0=28. This corresponds to mode ② of FIG. 10.

[0087] R=28−0•Z=28 and R′=28•2−1•Z=19

[0088] (5) 4th clock: R′=19 and Ymsb=1

[0089] Thus, SUM0=19+X•1=36. This corresponds to mode ② of FIG. 10.

[0090] R=36−0•Z=36 and R′=36•2−1•Z=35

[0091] (6) 5th clock: R′=35 and Ymsb=1

[0092] Thus, SUM0=35+X•1=52. This corresponds to mode ③ of FIG. 10.

[0093] R=52−1•Z=15

[0094] The above processing is summarized in the calculation table of FIG. 11.

[0095] The calculation of the quotient S: S=(001001)+(00011x)/2

[0096] S=9+3=12 . . . quotient

[0097] As a result, it is confirmed that 17•27=459=12•37 with remainder 15.

[0098] FIG. 12 shows the third embodiment according to the present invention and is a circuit in which the residue computing device on GF(p) shown in FIG. 9 has been speeded up. The circuit shown in FIG. 9 adopts a configuration in which the residue add circuit (531) and the wire shift circuit (532) are connected in series so as to perform their calculations one after the other. Considering that the delay times of the adder (ADD) and subtractors (SUB0 and SUB1) operating on 225-bit data are relatively long, the delays in their calculation times cannot be neglected. Thus, the third embodiment adopts a configuration that integrally forms the residue add circuit (531) and the wire shift circuit (532) by considering the relationship of FIG. 10 in a developed form. Here, a value equal to twice the summed value (SUM) is compared with the modulo Z, with a value equal to twice the modulo Z, and with a value equal to three times the modulo Z, so that a value equal to one-half the modulo Z does not appear upon integration. This is because, when a value equal to one-half the modulo Z is handled, its least significant bit is dropped and hence the values no longer add up correctly. Therefore, the value 3Z must be calculated in advance, even though the value 2Z can be produced by a wire shift circuit. Although this calculation is easy, it has the drawback that the value must be stored in a Z′ register before the calculation of a residue, which adds effort. However, if additional chip area can be tolerated, one more adder may be provided so that the Z′ register can be omitted.

[0099] Referring to FIG. 12, the value (2•SUM) equal to twice the summed value (SUM) is produced by the corresponding wire left shift circuit T. The value (2•Z) equal to twice the modulo Z is produced in the same way. The value (3•Z) equal to three times the modulo Z is calculated in advance as described above and stored in the Z′ register. The results of subtraction executed by the subtractors SUB0, SUB1 and SUB2 are shown in FIG. 13. In this table, the results of subtraction are divided into cases according to the variables BN2, BN1 and BN0, which are set when no borrow appears. The cases that can actually occur are limited to modes ①, ②, ③ and ④. When one of them is selected, the subtracted value in that mode, i.e., 2•SUM, 2•SUM−Z, 2•SUM−2•Z or 2•SUM−3•Z, is stored in a temporary register TP2 as a temporary residue. The temporary register TP1 is separately provided, by way of example, to prevent a linkage of arithmetic operations. A final residue R is stored in the corresponding R register as the value obtained after passing through a wire right shift circuit T^(−1). This process returns the value to the digit currently being handled; that is, it restores the value doubled in advance by the wire left shift circuit T to its original magnitude. The quotient S can be obtained by accurately evaluating the multiple of Z involved whenever a subtraction is made under the modulo Z. The contribution to the quotient at the corresponding digit is an integral multiple of Z/2. If this multiple is expressed in binary form, its bits are the values of S0 and S1 as they are. The values of S0 and S1 are brought to the LSBs of the corresponding subtraction registers SB0 and SB1 in a quotient calculation circuit (633) as they are. The contribution to the quotient can finally be determined according to S=SB0+SB1/2 when S0 and S1 are set as follows:

[0100] $S0 = \overline{BN2}\,BN1\,BN0 + BN2\,BN1\,BN0$ (modes ③ and ④)

[0101] $S1 = \overline{BN2}\,\overline{BN1}\,BN0 + BN2\,BN1\,BN0$ (modes ② and ④).

[0102] When the processing of the clocks is finished and the necessary bits are stored in the subtraction registers, the correct quotient is taken into the corresponding S register. This is because all the multiples of the modulo Z have then been collected. At this time, the value of the subtraction register S0 is passed through the wire one-bit right shift circuit T^(−1) in order to restore the contribution to the quotient, which was doubled in advance, to its original state.
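
The integrated comparison against Z, 2Z and 3Z can be sketched in Python as follows, as an illustration only. The names `decode` and `residue_fused` are hypothetical, and the flags BN0, BN1 and BN2 are modelled as "no borrow" results of 2•SUM−Z, 2•SUM−2•Z and 2•SUM−3•Z. How the final residue is recovered when an odd multiple of Z/2 has been removed is an assumption of this sketch: it takes the residue at the current digit as SUM−S0•Z, whereas the text states only that the doubled value is passed through T^(−1).

```python
def decode(bn2, bn1, bn0):
    """Decoder of paragraphs [0100] and [0101]: borrow flags -> (S0, S1)."""
    s0 = ((1 - bn2) & bn1 & bn0) | (bn2 & bn1 & bn0)        # modes 3 and 4
    s1 = ((1 - bn2) & (1 - bn1) & bn0) | (bn2 & bn1 & bn0)  # modes 2 and 4
    return s0, s1


def residue_fused(x, y, z, nbits):
    """Return (R, S) with x*y == S*z + R using one comparison stage per clock.

    Hypothetical model of the FIG. 12 data flow; assumes 0 <= x < z.
    """
    z2 = z << 1        # 2Z, obtainable by a wire shift
    z3 = 3 * z         # 3Z, precomputed (the Z' register in the text)
    tp = 0             # temporary residue R'
    sb0 = sb1 = 0      # subtraction registers SB0 and SB1
    r = 0
    for k in range(nbits - 1, -1, -1):
        total = tp + x * ((y >> k) & 1)   # SUM
        doubled = total << 1              # 2*SUM via wire left shift T
        bn0 = int(doubled >= z)           # no borrow in 2*SUM - Z
        bn1 = int(doubled >= z2)          # no borrow in 2*SUM - 2Z
        bn2 = int(doubled >= z3)          # no borrow in 2*SUM - 3Z
        s0, s1 = decode(bn2, bn1, bn0)
        # Selects 2*SUM, 2*SUM-Z, 2*SUM-2Z or 2*SUM-3Z (modes 1 to 4).
        tp = doubled - (2 * s0 + s1) * z
        # Assumption of this sketch: residue at the current digit.
        r = total - s0 * z
        sb0 = (sb0 << 1) | s0
        sb1 = (sb1 << 1) | s1
    return r, sb0 + (sb1 >> 1)            # S = SB0 + SB1/2
```

With the values of the earlier worked example (X=17, Y=27, Z=37, six bits), this sketch also yields the residue 15 and the quotient 12.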

[0103] When the circuit shown in FIG. 12 is compared with the circuit shown in FIG. 9, the latter has one stage of adder, two stages of subtractors and two stages of multiplexers, whereas the former has one stage of adder, one stage of subtractors and one stage of transmission gates. Therefore, the delay time is reduced to about half, and a speed-up is achieved. However, this comes at the cost of either pre-calculating 3Z or adding one more adder.

[0104] The residue computing device illustrated in the circuit diagram of FIG. 12 still has problems although it has been speeded up. The first problem is that the output of a carry is delayed in the addition (ADD). When data of m=224 bits are added together, a considerable delay time is produced even if a carry look-ahead circuit is used. It is nevertheless apparent that waiting for the carry takes less time than needlessly performing an arithmetic operation based on division (e.g., 32-bit by 32-bit division). Moreover, power consumption might be reduced by using a slower clock. The second problem is that the currently proposed circuit is still high in redundancy, which increases the amount of circuitry and the circuit area. According to a trial calculation, when the area of a chip implemented as an LSI is simply calculated, an area of about 7 mm² is required under a CMOS 1.2 μm rule. It is said, however, that the area should preferably be 4 mm² or less to prevent breakage of the chip when an IC card is bent or folded. It is therefore also necessary to reduce the overall amount of circuitry.

[0105] In view of the above requirement, a method of reducing the area of the adder itself will be considered first. A general method is to increase the number of clocks so that the amount of circuitry is reduced and the same circuit is used many times. For example, instead of executing addition and subtraction on the 225-bit data as they are, the 225-bit data may be divided into 32-bit words and calculated in eight parts by use of a 32-bit adder and the like. If the registers are also configured using a RAM, the area is significantly reduced.

[0106] FIG. 14 shows, as one such example, a circuit for calculating Z=X+Y with respect to 225-bit data X, Y and Z by use of a 32-bit ADC (adder with carry). In FIG. 14, the circuit includes a RAMZ (641), a RAMY (642) and a RAMX (643), each being a RAM having a 32-bit configuration. They are respectively used to store the values of the variables Z, Y and X. 32-bit registers Zadr (644), Yadr (645) and Xadr (646) are respectively used to designate the upper addresses of the RAMs. An AdrCount (647) is a 3-bit counter used to specify the lower addresses of the RAMs, and is shared by all the RAMs. Since the counter has 3 bits in the present example, it performs eight counts. The upper addresses and the lower addresses are combined and inputted to the corresponding decoders DecX (648), DecY (649) and DecZ (650) of the RAMs. They are also used to store the values of the variables. The adder ADC (651) performs addition of 32-bit data together with a carry. In order to use the carry generated by one addition in the next calculation, the value of a Count (652) must be inputted to a Cin (653) at each clock. Incidentally, whether each RAM is placed in writing mode or reading mode is determined according to an instruction. The write signal (e.g., WR) and read signal (e.g., RD) used for this purpose are omitted from the figure. It is desirable to give these RAMs general versatility and to connect them to an internal bus BUS (654).
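
The eight-pass addition described above can be sketched in Python as follows, as an illustration only. The helper names `to_words`, `from_words` and `add_multiword` are hypothetical, Python lists stand in for the RAMs, and storing the least significant 32-bit word at lower address 0 is an assumption of this sketch rather than something the text specifies.

```python
WORD_BITS = 32                    # width of the adder ADC
WORDS = 8                         # a 3-bit address counter gives eight counts
MASK = (1 << WORD_BITS) - 1


def to_words(value):
    """Split an integer into WORDS 32-bit words (assumed: least significant first)."""
    return [(value >> (WORD_BITS * i)) & MASK for i in range(WORDS)]


def from_words(words):
    """Reassemble an integer from its 32-bit words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


def add_multiword(ram_x, ram_y):
    """Compute Z = X + Y one 32-bit word per clock, feeding each carry forward."""
    ram_z = [0] * WORDS
    carry = 0                                    # carry-in for the first word
    for addr in range(WORDS):                    # models the address counter
        total = ram_x[addr] + ram_y[addr] + carry
        ram_z[addr] = total & MASK               # 32-bit sum word written to Z
        carry = total >> WORD_BITS               # carry-out reused as next carry-in
    return ram_z, carry
```

For example, `from_words(add_multiword(to_words(a), to_words(b))[0])` equals `a + b` for any two 225-bit values `a` and `b`, since their sum fits within the 256 bits held by eight words.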

[0107] A circuit for a power residue computing device on a Galois Field GF(p) is essentially identical to the circuit configuration of FIG. 5. The present circuit differs therefrom only in that the residue computing device must perform an arithmetic operation that takes carry and borrow into consideration. Its circuit configuration is similar to the one shown in FIG. 12.
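
As an illustration only, a power residue X^n under the modulo Z can be sketched in Python by repeated squaring combined with a product of the selected squared terms, which is how this sketch reads the wording of claim 5. The names `mulmod` and `power_residue` are hypothetical, and `mulmod` uses Python's built-in arithmetic merely as a stand-in for the residue computing device; the details of the FIG. 5 circuit are not reproduced here.

```python
def mulmod(a, b, z):
    """Stand-in for the residue computing device: residue of a*b under modulo z."""
    return (a * b) % z


def power_residue(x, n, z):
    """Return the residue of x**n under modulo z (binary method, LSB of n first)."""
    result = 1 % z            # running product of the selected terms
    square = x % z            # holds x**(2**i) mod z at step i
    while n > 0:
        if n & 1:             # this bit of the exponent register is set
            result = mulmod(result, square, z)
        square = mulmod(square, square, z)   # same value fed to both X and Y inputs
        n >>= 1
    return result
```

The result can be checked against Python's three-argument `pow(x, n, z)`.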

[0108] The present invention adopts a residue prefetch arithmetic operation system. A residue computing device and a power residue computing device on a Galois Field GF(2^m) according to this system, and a residue computing device and a power residue computing device on a Galois Field GF(p), have been proposed respectively. The residue prefetch arithmetic operation system according to the present invention has a circuit configuration extremely high in general versatility, as is apparent from the similarity of the above circuit group. Accordingly, the respective circuits can be shared by being switched by multiplexers or gates. For example, the power residue computing device (see FIG. 5) on the Galois Field GF(p) can share the use of the other residue computing devices (see FIGS. 9 and 12) through the addition of some gates. Considering that the adder or the like on the Galois Field GF(2^m) can be formed of EXOR gates, the power residue computing device (see FIG. 5) can also be combined with the residue computing device and power residue computing device on the Galois Field GF(2^m). Since, however, these can be easily created and configured from the circuit of the present invention, a reference to them is omitted in this detailed description.
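
To illustrate the similarity between the two cases, the GF(2^m) residue operation summarized in the abstract can be sketched in Python with the adder and subtractor both replaced by EXOR. This is a sketch under stated assumptions: the function name `gf2m_residue` is hypothetical, polynomials are encoded as integers with bit i holding the coefficient of x^i, Z is a polynomial of degree m, and the "leading bit MSB" of the summed value is taken to be bit m.

```python
def gf2m_residue(x, y, z, m):
    """Return the residue of x*y under modulo z on GF(2^m).

    Hypothetical model: x and y have degree below m, z has degree exactly m.
    """
    r_tmp = 0                                  # temporary residue R'
    total = 0                                  # summed value SUM
    for k in range(m - 1, -1, -1):             # handle Y from its leading bit
        y_msb = (y >> k) & 1
        total = r_tmp ^ (x if y_msb else 0)    # gate G1 and adder ADD (EXOR)
        if (total >> m) & 1:                   # gate G2: leading bit of SUM is 1
            total ^= z                         # subtractor SUB (EXOR)
        r_tmp = total << 1                     # one-bit shift for the next clock
    return total                               # residue R after the last clock
```

The loop has the same shape as in the GF(p) case; only the carry and borrow handling disappears, because addition and subtraction on GF(2^m) are both bitwise EXOR.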

[0109] In the present invention, the residue prefetch arithmetic operation system has been adopted, and the residue computing device and power residue computing device on the Galois Field GF(2^m) according to this system, and the residue computing device and power residue computing device on the Galois Field GF(p), have been proposed respectively. To reiterate, the residue prefetch arithmetic operation system according to the present invention can be made up of fundamental circuits and is extremely high in general versatility, as is apparent from the similarity of the circuit group. Accordingly, other circuit configurations can easily be built on or incorporated into it. Further, the residue prefetch arithmetic operation system can be easily extended to the residue arithmetic operations of elliptic curve cryptography and, by extension, to fields of application such as an IC card. In terms of function, the residue prefetch arithmetic operation system according to the present invention is particularly excellent in that the quotient can be directly determined. This means that the residue computing device according to the present invention also serves the function of a divider.

[0110] While the present invention has been described with reference to the illustrative embodiments, this description is not intended to be construed in a limiting sense. Various modifications of the illustrative embodiments, as well as other embodiments of the invention, will be apparent to those skilled in the art on reference to this description. It is therefore contemplated that the appended claims will cover any such modifications or embodiments as fall within the true scope of the invention.

What is claimed is:
1. A residue computing device on a Galois Field GF(2^m), for calculating a residue R of a product of a multiplier factor X and a multiplicand Y under a modulo Z, comprising: a gate G1 for allowing the multiplier factor X to pass therethrough when a leading bit MSB of the multiplicand Y is 1; an adder ADD for adding a temporary residue R′ and a value obtained by said passage; a gate G2 for allowing the modulo Z to pass therethrough when a leading bit MSB of a summed value SUM of the adder is 1; and a subtractor SUB for subtracting the modulo Z from the summed value SUM of the adder when the leading bit MSB of the summed value SUM is 1; wherein a process for setting a value obtained by shifting a subtracted value of the subtractor by one bit, as the temporary residue R′ according to the next clock is repeatedly performed for each clock to thereby calculate the residue R.
2. The residue computing device according to claim 1, wherein the leading bit MSB of the summed value SUM of the adder is taken in a shift register S to calculate a quotient S based on the modulo Z, of the product.
3. A residue computing device on a Galois Field GF(2^m), for calculating a residue R of a product of a multiplier factor X and a multiplicand Y under a modulo Z, comprising: residue computing units each including, a gate G1 for allowing the multiplier factor X to pass therethrough when a leading bit MSB of the multiplicand Y is 1; an adder ADD for adding a temporary residue R′ and a value obtained by said passage; a gate G2 for allowing the modulo Z to pass therethrough when a leading bit MSB of a summed value SUM of the adder is 1; a subtractor SUB for subtracting the modulo Z from the summed value SUM of the adder when the leading bit MSB of the summed value SUM is 1; and a one-bit wire shift circuit T; wherein the residue computing units are continuously connected in N stages and the multiplicand Y is comprised of an N-bit shift register; and wherein a process for setting a value of the final residue computing unit as a temporary residue R′ of the leading residue computing unit according to the next clock is repeatedly performed for each clock to thereby calculate the residue R.
4. The residue computing device according to claim 3, wherein the leading bit MSB of the summed value SUM of the adder is brought into an N-bit shift register S to calculate a quotient S based on the modulo Z, of the product.
5. A residue computing device on a Galois Field GF(2^m) or GF(p), for calculating a residue R of the power X^n of a multiplier factor X under a modulo Z, comprising: a power residue computing unit for storing a temporary residue RX^m corresponding to the output of the residue computing device in both registers for the multiplier factor X and a multiplicand Y to thereby calculate the power of 2 of the multiplier factor X; a register unit for fixing the number of power n; and a direct product residue computing unit for calculating a direct product of terms calculated by the power residue computing unit from a temporary residue RX^n corresponding to the output of the residue computing device.
6. A residue computing device on a Galois Field GF(p), for calculating a residue R of a product of a multiplier factor X and a multiplicand Y under a modulo Z, comprising: a gate G1 for allowing the multiplier factor X to pass therethrough when a leading bit MSB of the multiplicand Y is 1; an adder ADD for adding a temporary residue R′ and a value obtained by said passage; a subtractor SUB0 for subtracting the modulo Z from a summed value SUM0 of the adder; a flag BN0 set to 1 when no borrow appears as a result of said subtraction; a residue add circuit comprised of a multiplexer MPX0, for selecting the summed value SUM0 or said subtracted value from the value of the flag BN0; a circuit T for wire-shifting the output of the multiplexer MPX0; a subtractor SUB1 for subtracting the modulo Z from a post-wire-shift value SUM1; a flag BN1 set to 1 when no borrow appears as a result of said subtraction; and a wire shift circuit comprised of a multiplexer MPX1, for selecting the summed value SUM1 or said subtracted value from the value of the flag BN1; wherein a process for setting a value outputted from the multiplexer MPX1 as the temporary residue R′ according to the next clock is repeatedly executed for each clock to thereby calculate the residue R.
7. The residue computing device according to claim 6, wherein the output of the flag BN0 and the output of the flag BN1 are respectively brought to a subtracted value shift register SB0 and a subtracted value shift register SB1 every clock, and the value of the shift register SB1 is shifted by one bit and added to the value of the shift register SB0 to thereby calculate a quotient S based on the modulo Z, of the product.
8. A residue computing device on a Galois Field GF(p), for calculating a residue R of a product of a multiplier factor X and a multiplicand Y under a modulo Z, comprising: a gate for allowing the multiplier factor X to pass therethrough when a leading bit MSB of the multiplicand Y is 1; an adder for adding a temporary residue R′ and a value obtained by said passage; a circuit for performing a wire shift for doubling a value added by the adder; subtractors for respectively subtracting moduli Z, 2Z and 3Z from the doubled value; and a decoder for inputting signals when borrow of the subtractors are not produced, and outputting signals for controlling transmission gates provided on the output sides of the respective subtractors; wherein a process for setting a value outputted from said each transmission gate as the temporary residue R′ according to the next clock is repeatedly executed for each clock to thereby calculate the residue R.
9. The residue computing device according to claim 8, wherein an output produced from the decoder and an output produced from the decoder are respectively brought to a subtracted value shift register and a subtracted value shift register every clock, and the value of the corresponding shift register is shifted by one bit and added to the value of the corresponding shift register to thereby calculate a quotient S based on the modulo Z, of the product.